14 research outputs found

    STDnet: Exploiting high resolution feature maps for small object detection

    The accuracy of small object detection with convolutional neural networks (ConvNets) lags behind that of larger objects, as can be observed in popular contests like MS COCO. This is in part caused by the lack of specific architectures and of datasets with a sufficiently large number of small objects. Our work addresses these two issues. First, this paper introduces STDnet, a convolutional neural network focused on the detection of small objects, defined as those under 16 × 16 pixels. The high performance of STDnet builds on a novel early visual attention mechanism, called Region Context Network (RCN), which selects the most promising regions while discarding the rest of the input image. Processing only specific areas allows STDnet to keep high-resolution feature maps in deeper layers, providing low memory overhead and higher frame rates. High-resolution feature maps proved to be key to increasing localization accuracy on such small objects. Second, we also present USC-GRAD-STDdb, a video dataset with more than 56,000 annotated small objects in challenging scenarios. Experimental results on USC-GRAD-STDdb show that STDnet improves the AP@0.5 of the best state-of-the-art object detectors for small target detection from 50.8% to 57.4%. Performance has also been tested on MS COCO for objects under 16 × 16 pixels. In addition, a spatio-temporal baseline network, STDnet-bST, is proposed to make use of the information in successive frames, increasing the AP@0.5 of STDnet by 2.3%. Finally, optimizations have been carried out so the network fits on embedded devices such as the Jetson TX2. This research was funded by Gradiant, Spain, and also partially funded by the Spanish Ministry of Economy and Competitiveness under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32 (MICINN), and the Galician Ministry of Education, Culture and Universities, Spain, under grant ED431G/08. Brais Bosquet is supported by the Galician Ministry of Education, Culture and Universities, Spain. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
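
The core idea of processing only promising regions can be sketched in plain Python (hypothetical `select_regions` helper; the actual RCN is a learned attention module, not a fixed top-k rule over a score grid):

```python
def select_regions(score_map, feat_map, k, size):
    """Keep the k highest-scoring coarse windows and crop the matching
    patches from a high-resolution feature map, so that deeper layers
    only process promising areas."""
    H, W = len(score_map), len(score_map[0])
    windows = [(score_map[i][j], i, j) for i in range(H) for j in range(W)]
    windows.sort(reverse=True)
    crops = []
    for _, i, j in windows[:k]:
        # each coarse cell (i, j) maps to a size x size high-res patch
        patch = [row[j * size:(j + 1) * size]
                 for row in feat_map[i * size:(i + 1) * size]]
        crops.append(((i, j), patch))
    return crops
```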

    Real-time siamese multiple object tracker with enhanced proposals

    Maintaining the identity of multiple objects in real-time video is a challenging task, as it is not always feasible to run a detector on every frame. Thus, motion estimation systems are often employed, which either do not scale well with the number of targets or produce features with limited semantic information. To solve these problems and allow the tracking of dozens of arbitrary objects in real time, we propose SiamMOTION. SiamMOTION includes a novel proposal engine that produces quality features through an attention mechanism and a region-of-interest extractor fed by an inertia module and powered by a feature pyramid network. Finally, the extracted tensors enter a comparison head that efficiently matches pairs of exemplars and search areas, generating quality predictions via a pairwise depthwise region proposal network and a multi-object penalization module. SiamMOTION has been validated on five public benchmarks, achieving leading performance against current state-of-the-art trackers. Code available at: https://www.github.com/lorenzovaquero/SiamMOTION. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación [grant numbers PID2020-112623GB-I00, RTI2018-097088-B-C32], and the Galician Consellería de Cultura, Educación e Universidade [grant numbers ED431C 2018/29, ED431C 2021/048, ED431G 2019/04]. These grants are co-funded by the European Regional Development Fund (ERDF). Lorenzo Vaquero is supported by the Spanish Ministerio de Universidades under the FPU national plan (FPU18/03174). We also gratefully acknowledge the support of NVIDIA Corporation for hardware donations used for this research.
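
The inertia module's role — placing each new search area where the target's recent motion predicts it will be, rather than re-running a detector — can be sketched as follows (hypothetical names; the real module operates on feature-pyramid tensors, not raw coordinates):

```python
def inertia_center(prev_center, velocity, damping=0.9):
    """Predict the next search-area center from the target's recent
    motion; damping < 1 discounts the velocity estimate."""
    (x, y), (vx, vy) = prev_center, velocity
    return (x + damping * vx, y + damping * vy)
```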

    Tracking more than 100 arbitrary objects at 25 FPS through deep learning

    Most video analytics applications rely on object detectors to localize objects in frames. However, when real-time operation is a requirement, running the detector on every frame is usually not possible. This is somewhat circumvented by instantiating visual object trackers between detector calls, but this does not scale with the number of objects. To tackle this problem, we present SiamMT, a new deep learning multiple visual object tracking solution that applies single-object tracking principles to multiple arbitrary objects in real time. To achieve this, SiamMT reuses feature computations, implements a novel crop-and-resize operator, and defines a new and efficient pairwise similarity operator. SiamMT naturally scales up to several dozen targets, reaching 25 fps with 122 simultaneous objects in VGA videos, or up to 100 simultaneous objects in HD720 video. SiamMT has been validated on five large real-time benchmarks, achieving leading performance against current state-of-the-art trackers. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación [grant numbers PID2020-112623GB-I00, RTI2018-097088-B-C32], and the Galician Consellería de Cultura, Educación e Universidade [grant numbers ED431C 2018/29, ED431C 2017/69, accreditation 2016–2019, ED431G/08]. These grants are co-funded by the European Regional Development Fund (ERDF). Lorenzo Vaquero is supported by the Spanish Ministerio de Universidades under the FPU national plan (FPU18/03174).
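
The pairwise similarity idea can be illustrated with the depthwise cross-correlation commonly used by Siamese trackers; this plain-list toy (hypothetical `depthwise_xcorr`) is only a sketch of the general operator, not SiamMT's actual implementation:

```python
def depthwise_xcorr(search, exemplar):
    """Per-channel cross-correlation of an exemplar feature map slid
    over a search-area feature map; lists index [channel][row][col]."""
    out = []
    for c in range(len(search)):
        s, k = search[c], exemplar[c]
        H, W = len(s), len(s[0])
        h, w = len(k), len(k[0])
        # valid correlation: one response value per placement
        resp = [[sum(s[i + di][j + dj] * k[di][dj]
                     for di in range(h) for dj in range(w))
                 for j in range(W - w + 1)]
                for i in range(H - h + 1)]
        out.append(resp)
    return out
```

High responses mark the search-area positions that best match the exemplar, which is what lets one network compare many target/search pairs cheaply.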

    Short-term anchor linking and long-term self-guided attention for video object detection

    We present a new network architecture able to take advantage of the spatio-temporal information available in videos to boost object detection precision. First, box features are associated and aggregated by linking proposals that come from the same anchor box in nearby frames. Then, we design a new attention module that aggregates short-term enhanced box features to exploit long-term spatio-temporal information. This module takes advantage of geometric features in the long term for the first time in the video object detection domain. Finally, a spatio-temporal double head is fed with both spatial information from the reference frame and the aggregated information that takes into account the short- and long-term temporal context. We have tested our proposal on five video object detection datasets with very different characteristics, in order to prove its robustness in a wide range of scenarios. Non-parametric statistical tests show that our approach outperforms the state of the art. Our code is available at https://github.com/daniel-cores/SLTnet. This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016-2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
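
Attention-based temporal aggregation of box features can be sketched as a softmax-weighted average (hypothetical names and a fixed 50/50 fusion; the paper's module is learned end-to-end and also exploits geometric features):

```python
from math import exp

def attention_aggregate(ref_feat, support_feats, sims):
    """Aggregate support-frame box features into a reference-frame box
    feature, weighting each support feature by a softmax over its
    similarity score to the reference."""
    weights = [exp(s) for s in sims]
    total = sum(weights)
    weights = [w / total for w in weights]
    agg = [sum(w * f[d] for w, f in zip(weights, support_feats))
           for d in range(len(ref_feat))]
    # fuse reference information with the aggregated temporal context
    return [(r + a) / 2 for r, a in zip(ref_feat, agg)]
```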

    Efficient edge filtering of directly-follows graphs for process mining

    Automated process discovery is a process mining operation that takes as input an event log of a business process and generates a diagrammatic representation of the process. In this setting, a common diagrammatic representation generated by commercial tools is the directly-follows graph (DFG). In some real-life scenarios, the DFG of an event log contains hundreds of edges, hindering its understandability. To overcome this shortcoming, process mining tools generally offer the possibility of filtering the edges in the DFG. We study the problem of efficiently filtering the DFG extracted from an event log while retaining the most frequent relations. We formalize this problem as an optimization problem, specifically, the problem of finding a sound spanning subgraph of a DFG with a minimal number of edges and a maximal sum of edge frequencies. We show that this problem is an instance of an NP-hard problem and outline several polynomial-time heuristics to compute approximate solutions. Finally, we report on an evaluation of the efficiency and optimality of the proposed heuristics using 13 real-life event logs. We thank Luciano García-Bañuelos for proposing the idea of combining the results of Chu-Liu-Edmonds’ algorithm to filter a DFG. We also thank Adriano Augusto for providing us with the implementation of the Split Miner filtering technique. This research was funded by the Spanish Ministry of Economy and Competitiveness (TIN2017-84796-C2-1-R) and the Galician Ministry of Education, Culture and Universities (ED431G/08). These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program). D. Chapela-Campa is supported by the Spanish Ministry of Education, under the FPU national plan (FPU16/04428 and EST19/00135). This research is also funded by the Estonian Research Council (grant PRG1226).
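
A greedy heuristic in this spirit can be sketched as follows: add edges in decreasing frequency until every node is reachable from the source and can reach the sink, which guarantees every node lies on some source-to-sink path (a minimal illustrative sketch, not any of the paper's exact heuristics):

```python
from collections import defaultdict

def filter_dfg(edges, source, sink):
    """Greedy DFG filter: keep edges by decreasing frequency until every
    node is on a source-to-sink path. edges is a list of (u, v, freq)."""
    nodes = {source, sink}
    for u, v, _ in edges:
        nodes.update((u, v))

    succ, pred = defaultdict(set), defaultdict(set)

    def reachable(adj, start):
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    kept = []
    for u, v, freq in sorted(edges, key=lambda e: -e[2]):
        kept.append((u, v, freq))
        succ[u].add(v)
        pred[v].add(u)
        # stop once every node is reachable from source and co-reachable to sink
        if reachable(succ, source) >= nodes and reachable(pred, sink) >= nodes:
            break
    return kept
```

Low-frequency shortcut edges that are not needed for connectivity never get added, which is the intended filtering effect.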

    Real-time visual detection and tracking system for traffic monitoring

    Computer vision systems for traffic monitoring represent an essential tool for a broad range of traffic surveillance applications. Two of the most noteworthy challenges for these systems are real-time operation with hundreds of vehicles and the total occlusions which hinder the tracking of the vehicles. In this paper, we present a traffic monitoring approach that deals with these two challenges through three modules: detection, tracking and data association. First, vehicles are identified through a deep learning based detector. Second, tracking is performed with a combination of a Discriminative Correlation Filter and a Kalman Filter. This makes it possible to estimate the tracking error, making tracking more robust and reliable. Finally, data association through the Hungarian algorithm combines the information of the previous steps. The contributions are: (i) a real-time traffic monitoring system robust to occlusions that can process more than four hundred vehicles simultaneously; and (ii) the application of the system to anomaly detection in traffic and roundabout input/output analysis. The system has been evaluated with more than two thousand vehicles in real-life videos. This research was partially funded by the Spanish Ministry of Science and Innovation under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grant ED431G/08. Mauro Fernández is supported by the Spanish Ministry of Economy and Competitiveness under grant BES-2015-071889. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
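
The data-association step solves an assignment problem between existing tracks and new detections. The sketch below uses brute-force enumeration, which yields the same optimum as the Hungarian algorithm but is only practical for small problems (hypothetical names; centroid distance stands in for the system's actual cost):

```python
from itertools import permutations

def associate(tracks, detections, max_dist=50.0):
    """Optimal track-to-detection assignment under Euclidean-distance
    cost, via brute force over permutations. Matches farther than
    max_dist are gated out (e.g. occluded vehicles)."""
    if not tracks or not detections:
        return []
    cost = [[((tx - dx) ** 2 + (ty - dy) ** 2) ** 0.5
             for (dx, dy) in detections]
            for (tx, ty) in tracks]
    n, m = len(tracks), len(detections)
    best, best_cost = None, float("inf")
    for perm in permutations(range(m), min(n, m)):
        c = sum(cost[i][j] for i, j in zip(range(n), perm))
        if c < best_cost:
            best, best_cost = perm, c
    return [(i, j) for i, j in zip(range(n), best) if cost[i][j] <= max_dist]
```

In practice one would replace the enumeration with a polynomial-time Hungarian solver; the gating and cost-matrix structure stay the same.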

    Autonomous navigation for UAVs managing motion and sensing uncertainty

    We present a motion planner for the autonomous navigation of UAVs that manages motion and sensing uncertainty at planning time. By doing so, optimal paths in terms of probability of collision, traversal time and uncertainty are obtained. Moreover, our approach takes into account the real dimensions of the UAV in order to reliably estimate the probability of collision from the predicted uncertainty. The motion planner relies on a graduated-fidelity state lattice and a novel multi-resolution heuristic, both of which adapt to the obstacles in the map. This allows managing the uncertainty at planning time while still obtaining solutions fast enough to control the UAV in real time. Experimental results show the reliability and the efficiency of our approach in different real environments and with different motion models. Finally, we also report planning results for the reconstruction of 3D scenarios, showing that with our approach the UAV can obtain a precise 3D model autonomously. This research was funded by the Spanish Ministry for Science, Innovation and Universities, Spain (grant TIN2017-84796-C2-1-R) and the Galician Ministry of Education, University and Professional Training, Spain (grants ED431C 2018/29 and "accreditation 2016–2019, ED431G/08"). These grants were co-funded by the European Regional Development Fund (ERDF/FEDER program).
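
Combining per-step collision estimates into a path-level probability follows from a standard independence assumption (a sketch only; the paper's planner evaluates such costs over a state lattice using the predicted uncertainty and the UAV's real dimensions):

```python
def path_collision_probability(step_probs):
    """Probability that a path collides, assuming each step's collision
    probability is independent: 1 minus the product of survivals."""
    survive = 1.0
    for p in step_probs:
        survive *= (1.0 - p)
    return 1.0 - survive
```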

    A full data augmentation pipeline for small object detection based on generative adversarial networks

    Object detection accuracy on small objects, i.e., objects under 32 × 32 pixels, lags behind that of large ones. To address this issue, innovative architectures have been designed and new datasets have been released. Still, the number of small objects in many datasets does not suffice for training. The advent of generative adversarial networks (GANs) opens up a new data augmentation possibility for training architectures without the costly task of annotating huge datasets for small objects. In this paper, we propose a full data augmentation pipeline for small object detection which combines a GAN-based object generator with object segmentation, image inpainting and image blending techniques to achieve high-quality synthetic data. The main component of our pipeline is DS-GAN, a novel GAN-based architecture that generates realistic small objects from larger ones. Experimental results show that our overall data augmentation method improves the performance of state-of-the-art models by up to 11.9% AP on UAVDT and by 4.7% AP on iSAID, both on the small-objects subset and in a scenario where the number of training instances is limited. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación [grant numbers PID2020-112623GB-I00, RTI2018-097088-B-C32], and the Galician Consellería de Cultura, Educación e Universidade [grant numbers ED431C 2018/29, ED431C 2021/048, ED431G 2019/04]. These grants are co-funded by the European Regional Development Fund (ERDF). This paper was also supported by the European Union’s Horizon 2020 research and innovation programme under grant number
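
The final blending step of such a pipeline can be sketched as mask-weighted compositing of a generated object crop into a target frame (hypothetical `paste_object` on grayscale lists; the paper uses dedicated segmentation, inpainting and blending techniques rather than plain alpha compositing):

```python
def paste_object(frame, obj, mask, top, left):
    """Blend a synthetic object crop into a frame at (top, left) using a
    soft mask with values in [0, 1]; images are [row][col] lists."""
    out = [row[:] for row in frame]  # leave the original frame untouched
    for i in range(len(obj)):
        for j in range(len(obj[0])):
            a = mask[i][j]
            out[top + i][left + j] = (a * obj[i][j]
                                      + (1 - a) * frame[top + i][left + j])
    return out
```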

    A FastSLAM-based algorithm for omnidirectional cameras

    Environments with a low density of landmarks are difficult for vision-based Simultaneous Localization and Mapping (SLAM) algorithms. The use of omnidirectional cameras, which have a wide field of view, is especially interesting in these environments, as several landmarks are usually detected in each image. A typical example of this kind of situation occurs in indoor environments where the lights placed on the ceiling are the landmarks. The use of omnivision combined with this type of landmark presents two challenges: data association and the initialization of landmarks with a bearing-only sensor. In this paper we present a SLAM algorithm based on the well-known FastSLAM approach. The proposal includes a novel hierarchical data association method based on the Hungarian algorithm, and a delayed initialization of the landmarks. The approach has been tested in a real environment with a Pioneer 3-DX robot. This work was supported by the Spanish Ministry of Economy and Competitiveness under grants TIN2011-22935 and TIN2009-07737, and by the Galician Government (Consolidation of Competitive Research Groups, Xunta de Galicia ref. 2010/6). Manuel Mucientes is supported by the Ramón y Cajal program of the Spanish Ministry of Economy and Competitiveness.
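
Gated matching of observed bearings to predicted landmark bearings can be sketched greedily (a simplified stand-in for the paper's hierarchical method, which relies on the optimal Hungarian algorithm rather than this greedy pass):

```python
def associate_bearings(observed, landmarks, gate=0.2):
    """Match observed bearings (radians) to predicted landmark bearings
    by smallest angular difference, one-to-one, rejecting matches
    outside the gate. Unmatched observations would become candidate new
    landmarks under a delayed-initialization scheme."""
    pairs = sorted(((abs(o - p), i, j)
                    for i, o in enumerate(observed)
                    for j, p in enumerate(landmarks)),
                   key=lambda t: t[0])
    used_obs, used_lm, matches = set(), set(), []
    for d, i, j in pairs:
        if d <= gate and i not in used_obs and j not in used_lm:
            matches.append((i, j))
            used_obs.add(i)
            used_lm.add(j)
    return matches
```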

    STDnet-ST: Spatio-temporal ConvNet for small object detection

    Object detection through convolutional neural networks is reaching unprecedented levels of precision. However, a detailed analysis of the results shows that the accuracy in the detection of small objects is still far from satisfactory. A recent trend that will likely improve overall object detection success is to use spatial information operating alongside temporal video information. This paper introduces STDnet-ST, an end-to-end spatio-temporal convolutional neural network for small object detection in video. We define small objects as those under 16 × 16 pixels, where the features become less distinctive. STDnet-ST is an architecture that detects small objects over time and correlates pairs of the top-ranked regions with the highest likelihood of containing those small objects. This makes it possible to link the small objects across time as tubelets. Furthermore, we propose a procedure to dismiss unprofitable object links in order to provide high-quality tubelets, increasing the accuracy. STDnet-ST is evaluated on the publicly accessible USC-GRAD-STDdb, UAVDT and VisDrone2019-VID video datasets, where it achieves state-of-the-art results for small objects. This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016-2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
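
Linking per-frame detections into tubelets can be sketched with greedy IoU matching between consecutive frames (hypothetical `link_tubelets`; STDnet-ST instead correlates top-ranked region pairs inside the network and prunes unprofitable links):

```python
def link_tubelets(frame_dets, iou_thresh=0.5):
    """Greedily link per-frame boxes (x1, y1, x2, y2) into tubelets by
    IoU between each tubelet's last box and the next frame's boxes."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    tubelets = [[b] for b in frame_dets[0]]
    for dets in frame_dets[1:]:
        unused = list(dets)
        for tube in tubelets:
            best = max(unused, key=lambda b: iou(tube[-1], b), default=None)
            if best is not None and iou(tube[-1], best) >= iou_thresh:
                tube.append(best)
                unused.remove(best)
        # boxes that matched no tubelet start new ones
        tubelets.extend([b] for b in unused)
    return tubelets
```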